jialei777 commented Jul 24, 2025

  • Onboarded a shallow version of the DeepSeek V3 model for local testing and end-to-end (e2e) tests
  • Onboarded a full-size DeepSeek V3 model for training on an xpk cluster
  • Updated the FLOPs computation for DeepSeek to account for its MLA (multi-head latent attention) and MoE (mixture-of-experts) logic
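The FLOPs bullet does not spell out the formulas. The following is a rough sketch, not the PR's actual code, of how per-token FLOPs for one MLA + MoE layer can be counted. It uses 2 FLOPs per multiply-accumulate and the usual rule that a training step (forward plus backward) costs about 3× the forward pass. The config class, its field names, and the function names are hypothetical. The sketch also simplifies in several ways: it ignores embeddings, the LM head, norms, RoPE, softmax, and the dense (non-MoE) leading layers, treats every layer as MoE, and counts attention against the full sequence with no causal discount.

```python
from dataclasses import dataclass


@dataclass
class DeepSeekLikeConfig:
    # Hypothetical field names; values would come from the model config.
    hidden_size: int
    num_heads: int
    q_lora_rank: int          # rank of the low-rank query projection
    kv_lora_rank: int         # rank of the compressed (latent) KV
    qk_nope_head_dim: int     # per-head query/key dim without RoPE
    qk_rope_head_dim: int     # per-head query/key dim with RoPE
    v_head_dim: int           # per-head value dim
    moe_intermediate_size: int
    num_routed_experts: int
    num_shared_experts: int
    experts_per_token: int    # top-k routed experts per token


def mla_flops_per_token(cfg: DeepSeekLikeConfig, seq_len: int) -> int:
    """Forward FLOPs of one MLA block for a single token (2 FLOPs per MAC)."""
    h, n = cfg.hidden_size, cfg.num_heads
    qk_dim = cfg.qk_nope_head_dim + cfg.qk_rope_head_dim
    # Query path: hidden -> q_lora_rank -> num_heads * qk_dim
    q_proj = 2 * h * cfg.q_lora_rank + 2 * cfg.q_lora_rank * n * qk_dim
    # KV path: hidden -> (kv_lora_rank + shared RoPE key), then the latent
    # is expanded to per-head non-RoPE keys and values.
    kv_proj = (2 * h * (cfg.kv_lora_rank + cfg.qk_rope_head_dim)
               + 2 * cfg.kv_lora_rank * n * (cfg.qk_nope_head_dim + cfg.v_head_dim))
    # Output projection: num_heads * v_head_dim -> hidden
    out_proj = 2 * n * cfg.v_head_dim * h
    # QK^T scores and attention-weighted values over seq_len keys (no causal discount)
    attn = 2 * seq_len * n * qk_dim + 2 * seq_len * n * cfg.v_head_dim
    return q_proj + kv_proj + out_proj + attn


def moe_flops_per_token(cfg: DeepSeekLikeConfig) -> int:
    """Forward FLOPs of one MoE block for a single token."""
    # Router: hidden -> one logit per routed expert
    router = 2 * cfg.hidden_size * cfg.num_routed_experts
    # Each expert is assumed to be a gated MLP with three matrices (gate, up, down).
    per_expert = 3 * 2 * cfg.hidden_size * cfg.moe_intermediate_size
    # Only the top-k routed experts plus the shared experts run for each token.
    active = cfg.experts_per_token + cfg.num_shared_experts
    return router + active * per_expert


def train_flops_per_token(cfg: DeepSeekLikeConfig, seq_len: int, num_layers: int) -> int:
    """Approximate training FLOPs per token: 3x the forward pass (fwd + bwd)."""
    forward = num_layers * (mla_flops_per_token(cfg, seq_len) + moe_flops_per_token(cfg))
    return 3 * forward
```

The key MoE point is that only the activated experts (top-k routed plus shared) contribute FLOPs, not all experts. Counting all experts would overstate model FLOPs and inflate MFU.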

MFU (model FLOPs utilization):
  • DeepSeek V3 (shallow): ~11%
  • DeepSeek V3 (full size): ~4%
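MFU is the model FLOPs actually achieved per second divided by the hardware's peak FLOPs per second. The reported numbers therefore depend directly on how model FLOPs are counted, which is why the MLA/MoE FLOPs update matters. The following is a minimal sketch. The function name and parameters are hypothetical, and the per-chip peak value would have to come from the accelerator's spec sheet for the dtype in use.

```python
def mfu(flops_per_token: float, tokens_per_step: int, step_time_s: float,
        num_chips: int, peak_flops_per_chip: float) -> float:
    """Model FLOPs utilization for one training step (hypothetical helper).

    flops_per_token: model training FLOPs per token (forward + backward)
    tokens_per_step: global batch size * sequence length
    step_time_s: measured wall-clock time of one step, in seconds
    peak_flops_per_chip: hardware peak FLOPs/s per chip for the dtype used
    """
    if step_time_s <= 0 or num_chips <= 0 or peak_flops_per_chip <= 0:
        raise ValueError("step time, chip count, and peak FLOPs must be positive")
    achieved_flops_per_s = flops_per_token * tokens_per_step / step_time_s
    return achieved_flops_per_s / (num_chips * peak_flops_per_chip)
```

For example, `mfu(1e9, 1000, 1.0, 2, 1e12)` returns `0.5`: 1e12 FLOPs/s achieved against 2e12 FLOPs/s of combined peak.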

jialei777 marked this pull request as ready for review August 11, 2025 18:09
jialei777 requested a review from liurupeng August 11, 2025 18:15
jialei777 merged commit 28a819f into main Aug 11, 2025
17 of 20 checks passed
jialei777 deleted the jialei/deepseek-v3-1 branch August 11, 2025 19:19